LIMITING DISTRIBUTIONS FOR THE NUMBER OF INVERSIONS IN LABELLED TREE FAMILIES



ALOIS PANHOLZER AND GEORG SEITZ 



Abstract. We consider so-called simple families of labelled trees, which contain, e.g., ordered, 
unordered, binary and cyclic labelled trees as special instances, and study the global and local 
behaviour of the number of inversions. In particular we obtain limiting distribution results for 
the total number of inversions as well as the number of inversions induced by the node labelled 
j in a random tree of size n. 



1. Introduction 

Throughout this paper we always consider rooted trees $T$ in which the vertices are labelled
with distinct integers of $\{1, \dots, |T|\}$, where $|T|$ is the size (i.e., the number of vertices) of $T$. An
inversion in a tree $T$ is a pair $(i, j)$ of vertices (we may always identify a vertex with its label),
such that $i > j$ and $i$ lies on the unique path from the root node root(T) of $T$ to $j$ (thus $i$ is an
ancestor of $j$ or, equivalently, $j$ is a descendant of $i$). Let us denote by inv(T) the number of
inversions in $T$.
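The definition can be made concrete with a short sketch. The parent-map representation and the example tree below are illustrative assumptions (the tree is hypothetical and is not the tree of Figure 1); the function only implements the definition of an inversion given above.

```python
def inversions_per_node(parent):
    """Count inversions (i, j): i > j and i is a proper ancestor of j.

    `parent` maps every label to the label of its parent; the root maps to None.
    Returns a dict j -> number of inversions induced by node j.
    """
    induced = {}
    for j in parent:
        count = 0
        i = parent[j]
        while i is not None:          # walk from j up to the root
            if i > j:
                count += 1
            i = parent[i]
        induced[j] = count
    return induced


# Hypothetical example tree: root 4 with children 2 and 5,
# node 1 is a child of 2, node 3 is a child of 5.
example = {4: None, 2: 4, 5: 4, 1: 2, 3: 5}
induced = inversions_per_node(example)
total = sum(induced.values())         # inv(T) is the sum of the local counts
print(induced, total)
```

In this example the inversions are (4,2), (2,1), (4,1), (5,3) and (4,3), so inv(T) = 5.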

In [6, 11] studies concerning the number of inversions in some important combinatorial tree
families $\mathcal{T}$ have been given by introducing so-called tree inversion polynomials. They shall be
defined as follows:

$$J_n(q) := \sum_{T \in \mathcal{T} :\, |T| = n} q^{\mathrm{inv}(T)}.$$

Actually, unlike in our studies, in [6, 11] the authors exclusively considered trees with the root
node labelled 1. Thus, in order to avoid confusion, we introduce also the slightly modified
polynomials $\hat{J}_n(q) := \sum_{T \in \mathcal{T} :\, |T| = n \text{ and } \mathrm{root}(T) = 1} q^{\mathrm{inv}(T)}$. For unordered trees, i.e., trees where one
assumes that to each vertex there is attached a (possibly empty) set of children (thus there is no
left-to-right ordering of the children of any node), Mallows and Riordan [11] could give an explicit
formula for a suitable generating function of the corresponding tree inversion polynomials:

$$\exp\Big( \sum_{n \ge 1} (q-1)^{n-1} \hat{J}_n(q) \frac{z^n}{n!} \Big) = \sum_{n \ge 0} q^{\binom{n}{2}} \frac{z^n}{n!}.$$
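The Mallows-Riordan formula can be checked by brute force for small sizes. The following sketch is only an illustration under stated assumptions: it writes $\hat{J}_n(q)$ for the inversion polynomial of unordered trees with root label 1, enumerates these trees as parent maps, and tests the identity obtained by differentiating both sides with respect to $z$ and comparing coefficients, namely $n\, q^{\binom{n}{2}} = \sum_{k=1}^{n} k \binom{n}{k} (q-1)^{k-1} \hat{J}_k(q)\, q^{\binom{n-k}{2}}$. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import comb


def inversions_or_none(parent, n):
    """Return inv(T) for a parent map of a tree rooted at 1, or None if it is not a tree."""
    inv = 0
    for j in parent:
        v, steps = j, 0
        while v != 1:
            v = parent[v]
            steps += 1
            if steps >= n:            # a cycle: the walk never reaches the root
                return None
            if v > j:                 # v is a proper ancestor of j with a larger label
                inv += 1
    return inv


@lru_cache(maxsize=None)
def hat_J(n, q):
    """Evaluate the sum of q**inv(T) over all unordered labelled trees on {1..n} with root 1."""
    total = 0
    for choice in product(range(1, n + 1), repeat=n - 1):
        parent = dict(zip(range(2, n + 1), choice))
        inv = inversions_or_none(parent, n)
        if inv is not None:
            total += q ** inv
    return total


def identity_holds(n, q):
    """Coefficient-wise form of the exponential formula (derivative in z of both sides)."""
    lhs = n * q ** comb(n, 2)
    rhs = sum(k * comb(n, k) * (q - 1) ** (k - 1) * hat_J(k, q) * q ** comb(n - k, 2)
              for k in range(1, n + 1))
    return lhs == rhs


print([hat_J(n, 1) for n in range(1, 6)])                       # tree counts n^(n-2)
print(all(identity_holds(n, q) for n in range(1, 6) for q in (2, 3)))
```

Evaluating at integer values of $q$ avoids polynomial arithmetic; the enumeration is exponential in $n$ and is meant only for very small sizes.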

Gessel et al. [6] considered $\hat{J}_n(q)$ for three other tree families:

• Ordered trees: one assumes that to each vertex there is attached a (possibly empty) 
sequence of children (thus there is a left-to-right ordering of the children of each node). 

• Cyclic trees: ordered trees, where one assumes that cyclic rearrangements of the subtrees 
of any node give the same tree. 

• Plane trees: ordered trees, where one assumes that cyclic rearrangements of the subtrees 
of the root node give the same tree. 

Unlike for unordered trees, no explicit formulas for a suitable generating function of the tree
inversion polynomial of the latter tree families could be given, but the authors provide exact and
asymptotic results for the evaluations of $\hat{J}_n(q)$ for the specific values $q = 0, 1, -1$. In particular,
$\hat{J}_n(0)$ enumerates so-called increasing trees, i.e., trees where each child node has a label larger
than its parent node.

Besides these studies it seems natural to ask, for a given combinatorial family $\mathcal{T}$ of trees,
questions about the "typical behaviour" of the number of inversions in a tree $T \in \mathcal{T}$ of size $n$.
In a probabilistic setting we may introduce a random variable $I_n$, which counts the number of
inversions of a random tree of size $n$, i.e., a tree chosen uniformly at random from all trees of the
family $\mathcal{T}$ of size $n$. Of course, this more probabilistic point of view and the before-mentioned
combinatorial approach are closely related. Let us denote by $T_n$ the number of trees of $\mathcal{T}$ of
size $n$. Then it holds

$$J_n(q) = T_n \sum_{k \ge 0} \mathbb{P}\{I_n = k\}\, q^k,$$

i.e., the probability generating function $p_n(q) := \sum_{k \ge 0} \mathbb{P}\{I_n = k\}\, q^k$ of the random variable $I_n$ is
simply given by $p_n(q) = J_n(q)/T_n$, and it holds $T_{n,k} = T_n\, \mathbb{P}\{I_n = k\}$ for the number $T_{n,k} := [q^k] J_n(q)$
of trees of size $n$ with exactly $k$ inversions.

A main concern of this paper is to describe the asymptotic behaviour of the random variable
$I_n$ for various important tree families by proving limiting distribution results. In our studies
of $I_n$ we use as tree models so-called simply generated tree families [5, 12], which contain many
important combinatorial tree families, such as the before-mentioned unordered trees, ordered
trees, and cyclic trees, but also others such as, e.g., binary trees, t-ary trees, and Motzkin trees,
as special instances. Simply generated trees are weighted ordered trees, where, given a
degree-weight sequence, each node gets a weight according to its out-degree, i.e., the number of its
children (see Subsection 2.1 for a precise definition); we remark that in probability theory such
tree models are known as Galton-Watson trees. As a main result we can show, provided the
degree-weight sequence satisfies certain mild growth conditions (which are all satisfied for the
before-mentioned tree families), that, after a suitable normalization of order $n^{3/2}$, $I_n$ converges in
distribution to a distribution known as the Airy distribution (see Subsection 3.2). We remark
that the Airy distribution also appears in enumerative studies of other combinatorial objects
such as, e.g., the area below lattice paths [9], the area of staircase polygons [14], sums of
parking functions [7], and the costs of linear probing hashing algorithms [4]. For the particular
tree family of unordered trees this limiting distribution result for $I_n$ has been shown already
by Flajolet et al. in [4] during their analysis of a linear probing hashing algorithm by using
close relations between the insertion costs of this algorithm and the number of inversions in
unordered trees. We note that we show convergence in distribution, thus obtaining asymptotic
results for $\mathbb{P}\{I_n \le x n^{3/2}\}$, or alternatively for the sums $\sum_{k \le x n^{3/2}} T_{n,k}$ with $x \in \mathbb{R}^+$, but we do not
obtain local limit laws, i.e., results concerning the behaviour of the probabilities $\mathbb{P}\{I_n = k\}$ or
the numbers $T_{n,k}$ themselves.

Besides this "global study" of the number of inversions in a random tree we are additionally
interested in the contribution to this quantity induced by a specific label $j$, i.e., in a "local
study". To do this we introduce random variables $I_{n,j}$, which count the number of inversions
of the kind $(i, j)$, with $i > j$ an ancestor of $j$, in a random tree of size $n$. Of course, one
could also introduce "local inversion polynomials" $J_{n,j}(q) := T_n \sum_{k \ge 0} \mathbb{P}\{I_{n,j} = k\}\, q^k$. Note that
$I_n = \sum_{j=1}^{n} I_{n,j}$, but the random variables $I_{n,j}$ are highly dependent. In our studies we describe
the asymptotic behaviour of the random variable $I_{n,j}$, depending on the growth of $j = j(n)$ with
respect to $n$. In particular, we obtain that for the main portion of labels, i.e., for $n - j \gg \sqrt{n}$, $I_{n,j}$
converges, after suitable normalization of order $\sqrt{n}$, in distribution to a Rayleigh distribution.
We remark that the Rayleigh distribution also appears frequently when studying combinatorial
objects, see, e.g., [5]. If $n - j \sim \alpha\sqrt{n}$ or $n - j = o(\sqrt{n})$ then the behaviour changes. Apart
from asymptotic results, we can for two particular tree families, namely ordered and unordered
trees, also give explicit formulas for the probabilities $\mathbb{P}\{I_{n,j} = k\}$. An example of a labelled tree
and the parameters studied is given in Figure 1.



Figure 1. A binary labelled tree of size 7 with a total number of 6 inversions, 
namely (3,1), (6,1), (3,2), (6,2), (6,4), (7,5). Thus two inversions each are 
induced by the nodes 1 and 2, whereas one inversion each is induced by the 
nodes 4 and 5. 

We remark that the asymptotic results obtained for inversions in trees are completely different
from the corresponding ones for permutations of a set $\{1, 2, \dots, n\}$. It is well-known (see,
e.g., [10]) that the total number of inversions in a random permutation of size $n$ is asymptotically
normally distributed with expectation and variance of order $n^2$ and $n^3$, respectively. Trivially,
the number of inversions of the kind $(i, j)$, with $i > j$ an element to the left of $j$, in a random
permutation of size $n$ is uniformly distributed on $\{0, 1, \dots, n - j\}$.

The plan of this paper is as follows. In Section 2 we collect definitions and known results
about simply generated tree families, whereas in Section 3 we state the main results of this
paper concerning the random variables $I_n$ and $I_{n,j}$. A proof of the results for $I_n$ and $I_{n,j}$ is
given in Section 4 and Section 5, respectively.

Before continuing we define some notation used throughout this paper. The operator $[z^n]$
extracts the coefficient of $z^n$ from a power series $A(z) = \sum_{n \ge 0} a_n z^n$, i.e., $[z^n] A(z) = a_n$. For
$s \in \mathbb{N}_0$, $x^{\underline{s}}$ (resp. $x^{\overline{s}}$) denotes the $s$-th falling (resp. rising) factorial of $x$, i.e., $x^{\underline{0}} = x^{\overline{0}} = 1$,
and $x^{\underline{s}} = x(x-1) \cdots (x-s+1)$, $x^{\overline{s}} = x(x+1) \cdots (x+s-1)$, for $s \ge 1$. Furthermore, for
each variable $x$ we denote the differential operator with respect to $x$ by $D_x$, and we define two
operators $V$ ($= V_q$) and $Z$, which act on bivariate power series $G(z,q)$ by $VG(z,q) := G(z,1)$,
and $ZG(z,q) := zG(z,q)$ (analogous definitions for multivariate power series). Moreover, if $X$
and $X_n$, $n \ge 1$, are random variables, then $X_n \xrightarrow{(d)} X$ denotes the weak convergence (i.e., the
convergence in distribution) of the sequence $(X_n)_{n \ge 1}$ to $X$.

2. Labelled families of simply generated trees and auxiliary results 

Families of simply generated trees were introduced by Meir and Moon in [12]. As mentioned
before, many important combinatorial tree families such as, e.g., labelled unordered trees (also
called Cayley trees), binary trees, labelled cyclic trees (also called mobile trees) and ordered
trees (also called planted plane trees), can be considered as special instances of simply generated
trees.

We now recall how simply generated tree families are defined in the labelled context, and 
then collect some well-known auxiliary results. Note that in the following the term "tree" will 
always denote a labelled tree. 

2.1. Definitions. A class $\mathcal{T}$ of (labelled) simply generated trees is defined in the following
way: One chooses a sequence $(\varphi_\ell)_{\ell \ge 0}$ (the so-called degree-weight sequence) of nonnegative real
numbers with $\varphi_0 > 0$. Using this sequence, the weight $w(T)$ of each ordered tree (i.e., each
rooted tree, in which the children of each node are ordered from left to right) is defined by
$w(T) := \prod_{v \in T} \varphi_{d(v)}$, where by $v \in T$ we mean that $v$ is a vertex of $T$ and $d(v)$ denotes the
number of children of $v$ (i.e., the out-degree of $v$). The family $\mathcal{T}$ associated to the degree-weight
sequence $(\varphi_\ell)_{\ell \ge 0}$ then consists of all trees $T$ (or all trees $T$ with $w(T) \ne 0$) together with their
weights.

We let $T_n := \sum_{|T| = n} w(T)$ denote the total weight of all trees of size $n$ in $\mathcal{T}$, and define
by $T(z) := \sum_{n \ge 1} T_n \frac{z^n}{n!}$ its exponential generating function. Then it follows that $T(z)$ satisfies the
(formal) functional equation

$$T(z) = z\varphi(T(z)), \qquad (1)$$

where the degree-weight generating function $\varphi(t)$ is defined via $\varphi(t) := \sum_{\ell \ge 0} \varphi_\ell t^\ell$.

We want to remark that each simply generated tree family $\mathcal{T}$ can also be defined by a formal
equation of the form

$$\mathcal{T} = \bigcirc * \varphi(\mathcal{T}), \qquad (2)$$

where $\bigcirc$ denotes a node, $*$ is the combinatorial product of labelled objects, and $\varphi(\mathcal{T})$ is a certain
substituted structure (see, e.g., [5]). Hence, the functional equation (1) can be obtained directly
from the combinatorial construction of $\mathcal{T}$ using the symbolic method (cf. [5]). Furthermore,
$(T_n)_{n \ge 1}$ is for many important simply generated tree families a sequence of natural numbers,
and then the total weight $T_n$ can be interpreted as the number of trees of size $n$ in $\mathcal{T}$. We now
give several examples where this is the case.

Examples: 

• Binary trees can be defined combinatorially as follows:

$$\mathcal{T} = \bigcirc * (\{\square\} \cup \mathcal{T}) * (\{\square\} \cup \mathcal{T}).$$

Here, $\square$ denotes an empty subtree and $\cup$ is the disjoint union. This formal equation
expresses that each binary tree consists of a root node and a left and a right subtree,
each of which is either a binary tree or empty. The formal equation for $\mathcal{T}$ can directly
be translated into a functional equation for $T(z)$, namely

$$T(z) = z(1 + T(z))^2.$$

Hence, binary trees are the simply generated tree family defined by $\varphi(t) = (1+t)^2$, i.e.,
by the degree-weight sequence $\varphi_\ell = \binom{2}{\ell}$, $\ell \ge 0$.

• Ordered trees are rooted trees, in which the children of each node are ordered. Thus,
combinatorially speaking, each ordered tree consists of a root node and a sequence of
ordered trees,

$$\mathcal{T} = \bigcirc * \mathrm{SEQ}(\mathcal{T}) = \bigcirc * (\{\square\} \cup \mathcal{T} \cup \mathcal{T}^2 \cup \mathcal{T}^3 \cup \dots).$$

From this one gets the functional equation

$$T(z) = \frac{z}{1 - T(z)},$$

i.e., $\varphi(t) = \frac{1}{1-t}$. Of course, this corresponds to the degree-weight sequence $\varphi_\ell = 1$,
$\ell \ge 0$.

• Unordered trees are rooted trees in which there is no order on the children of any node.
Hence, each unordered tree consists of a root node and a set of unordered trees, which
can be written formally as

$$\mathcal{T} = \bigcirc * \mathrm{SET}(\mathcal{T}) = \bigcirc * \Big(\{\square\} \cup \mathcal{T} \cup \tfrac{1}{2!}\mathcal{T}^2 \cup \tfrac{1}{3!}\mathcal{T}^3 \cup \dots\Big).$$

This leads to the functional equation

$$T(z) = z\exp(T(z)),$$

i.e., one has $\varphi(t) = \exp(t)$, or equivalently $\varphi_\ell = 1/\ell!$, $\ell \ge 0$.

• Cyclic trees may be considered as equivalence classes of ordered trees, where cyclic
rearrangements of the subtrees of nodes lead to a tree of the same class. Hence, each
cyclic tree is either a single root node or it consists of a root node and a (non-empty)
cycle of cyclic trees, which can be written formally as

$$\mathcal{T} = \bigcirc \cup \bigcirc * \mathrm{CYC}(\mathcal{T}) = \bigcirc * (\{\square\} \cup \mathrm{CYC}(\mathcal{T})).$$

This leads to the functional equation

$$T(z) = z\Big(1 + \log\frac{1}{1 - T(z)}\Big),$$

i.e., one has $\varphi(t) = 1 + \log\frac{1}{1-t}$, or equivalently $\varphi_0 = 1$ and $\varphi_\ell = 1/\ell$, $\ell \ge 1$.
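The total weights $T_n$ of the four example families can be computed directly from the functional equation $T(z) = z\varphi(T(z))$ by fixed-point iteration on truncated power series. The sketch below is only an illustration; the function names are hypothetical and exact rational arithmetic is used so that the non-integer degree weights cause no rounding.

```python
from fractions import Fraction
from math import comb, factorial


def multiply(a, b, n_max):
    """Product of two power series (coefficient lists), truncated after z^n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, x in enumerate(a):
        if x == 0:
            continue
        for j, y in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += x * y
    return out


def total_weights(phi, n_max):
    """Return [T_1, ..., T_n_max] for T(z) = z * phi(T(z)); phi(l) is the degree weight phi_l."""
    t = [Fraction(0)] * (n_max + 1)
    for _ in range(n_max):                     # each pass fixes one more coefficient
        acc = [Fraction(0)] * (n_max + 1)
        power = [Fraction(1)] + [Fraction(0)] * n_max      # T(z)^0
        for l in range(n_max + 1):
            acc = [c + phi(l) * p for c, p in zip(acc, power)]
            power = multiply(power, t, n_max)
        t = [Fraction(0)] + acc[:n_max]        # multiply phi(T(z)) by z
    return [t[n] * factorial(n) for n in range(1, n_max + 1)]


binary = total_weights(lambda l: comb(2, l), 4)
ordered = total_weights(lambda l: 1, 4)
unordered = total_weights(lambda l: Fraction(1, factorial(l)), 4)
cyclic = total_weights(lambda l: 1 if l == 0 else Fraction(1, l), 4)
print(binary, ordered, unordered, cyclic)
```

For binary, ordered and unordered trees the values agree with the classical counts $n!$ times a Catalan number and $n^{n-1}$, respectively.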

We remark that plane trees as considered in [6] are not covered by the definition of simply
generated trees. However, since every subtree of the root node of a plane tree is an ordered 
tree, the methods applied in this work for a study of the number of inversions can be adapted 
easily to treat also this tree family, which leads to the same limiting distribution results as for 
ordered trees. Thus we omit computations for this tree family. 



2.2. Auxiliary results. We now collect some known results (see, e.g., [5, 13]) on the function
$T(z)$ satisfying (1). First note that in general $T(z)$ and $\varphi(t)$ must be regarded as formal power
series, because they do not need to have a positive radius of convergence, and then (1) must be
understood as a formal equation. Thus, in order to analyze properties of simply generated tree
families by analytic methods, we will need to make certain assumptions on $\varphi$. In particular, we
will assume that $\varphi(t)$ has a positive radius of convergence $R$, and that there exists a minimal
positive solution $\tau < R$ of the equation

$$t\varphi'(t) = \varphi(t). \qquad (3)$$

If we define

$$d := \gcd\{\ell : \varphi_\ell > 0\}, \qquad (4)$$

it then follows that (3) has exactly $d$ solutions of smallest modulus, which are given by $\tau_j = \omega^j \tau$,
for $0 \le j \le d-1$, where $\omega = \exp(2\pi i/d)$. From the implicit function theorem it follows that the
equation $z = t/\varphi(t)$ is not invertible in any neighbourhood of $t = \tau_j$, for $0 \le j \le d-1$. This leads
to $d$ dominant singularities of $T(z)$ at $z = \rho_j$, where $\rho_j = \omega^j \rho$, $\rho = \tau/\varphi(\tau)$.

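For our purpose, it is important to note that under the above assumptions, $T(z)$ is amenable
to singularity analysis (cf. [3]), i.e., there are constants $\eta > 0$ and $0 < \phi < \pi/2$ such that $T(z)$
is analytic in the domain $\{z \in \mathbb{C} : |z| < \rho + \eta,\ z \ne \rho_j,\ |\mathrm{Arg}(z - \rho_j)| > \phi, \text{ for all } 0 \le j \le d-1\}$.
The local expansion of $T(z)$ around the singularity $z = \rho_j$ is given by

$$T(z) = \tau_j - \omega^j \sqrt{\frac{2\varphi(\tau)}{\varphi''(\tau)}} \sqrt{1 - \frac{z}{\rho_j}} + O(\rho_j - z). \qquad (5)$$
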

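Using singularity analysis and summing up the contributions of the $d$ dominant singularities,
one obtains

$$\frac{T_n}{n!} = [z^n] T(z) = d \sqrt{\frac{\varphi(\tau)}{2\pi\varphi''(\tau)}}\, \frac{1}{\rho^n n^{3/2}} \Big(1 + O\Big(\frac{1}{n}\Big)\Big), \qquad (6)$$

for $n \equiv 1 \bmod d$. If $n \not\equiv 1 \bmod d$, one has of course $T_n = 0$, because in this case each ordered
tree of size $n$ has weight zero.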
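As a numerical sanity check of the asymptotic formula for $T_n/n!$, one may compare it with the exact value for unordered trees, where $T_n = n^{n-1}$, $\tau = 1$, $\varphi(\tau) = \varphi''(\tau) = e$, $\rho = 1/e$ and $d = 1$. The sketch below assumes the formula in the form $d\sqrt{\varphi(\tau)/(2\pi\varphi''(\tau))}\,\rho^{-n} n^{-3/2}$ and works with logarithms to avoid overflow; the function names are hypothetical.

```python
import math


def log_exact_unordered(n):
    """log(T_n / n!) for unordered (Cayley) trees, where T_n = n^(n-1)."""
    return (n - 1) * math.log(n) - math.lgamma(n + 1)


def log_asymptotic(n, phi_tau, phi2_tau, rho, d=1):
    """log of d * sqrt(phi(tau) / (2 pi phi''(tau))) * rho^(-n) * n^(-3/2)."""
    return (math.log(d) + 0.5 * math.log(phi_tau / (2 * math.pi * phi2_tau))
            - n * math.log(rho) - 1.5 * math.log(n))


def ratio(n):
    """Exact value divided by the asymptotic main term; should tend to 1."""
    e = math.e
    return math.exp(log_exact_unordered(n) - log_asymptotic(n, e, e, 1 / e))


for n in (10, 100, 1000):
    print(n, ratio(n))
```

The ratio approaches 1 from below at rate $O(1/n)$, in line with the stated error term.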

In our analysis, we will further make use of the functions $\varphi^{(m)}(T(z))$ (where $\varphi^{(m)}(t)$ is the
$m$-th derivative of $\varphi(t)$). Each of these functions has $d$ dominant singularities at $z = \rho_j$,
$0 \le j \le d-1$, and complies with the requirements for singularity analysis. Around $z = \rho_j$, one
has the expansion

$$\varphi^{(m)}(T(z)) = \varphi^{(m)}(\tau_j) - \varphi^{(m+1)}(\tau_j)\, \omega^j \sqrt{\frac{2\varphi(\tau)}{\varphi''(\tau)}} \sqrt{1 - \frac{z}{\rho_j}} + O(\rho_j - z), \qquad (7)$$

and we will especially make use of the expansion

$$z\varphi'(T(z)) = 1 - \sqrt{2\rho\tau\varphi''(\tau)}\, \sqrt{1 - \frac{z}{\rho_j}} + O(\rho_j - z). \qquad (8)$$



3. Parameters studied and results 



3.1. Parameters studied. Consider a simply generated tree family $\mathcal{T}$ associated to a
degree-weight sequence $(\varphi_\ell)_{\ell \ge 0}$. In our analysis of parameters in trees of $\mathcal{T}$ we will always use the
"random tree model for weighted trees", i.e., when speaking about a random tree of size $n$ we
assume that each tree $T$ in $\mathcal{T}$ of size $n$ is chosen with a probability proportional to its weight
$w(T)$.

The main quantities of interest are the random variable $I_n$, which counts the total number
of inversions of a random simply generated tree of size $n$, and the random variable $I_{n,j}$, which
counts the number of inversions of the kind $(i, j)$, with $i > j$ an ancestor of $j$, in a random
simply generated tree of size $n$.

We mention the relation to a suitably adapted tree inversion polynomial for weighted tree
families:

$$J_n(q) := \sum_{T \in \mathcal{T} :\, |T| = n} w(T)\, q^{\mathrm{inv}(T)} = T_n \sum_{k \ge 0} \mathbb{P}\{I_n = k\}\, q^k.$$

3.2. Auxiliary results for probability distributions. We collect some basic facts about 
two important probability distributions appearing later in our analysis. 

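Definition 3.1. The Airy distribution is the distribution of a random variable $I$ with $r$-th
moments

$$\mu_r := \mathbb{E}(I^r) = \frac{2\sqrt{\pi}}{\Gamma\big(\frac{3r-1}{2}\big)}\, C_r,$$

where the constants $C_r$ can inductively be defined by

$$2C_r = (3r-4)\, r\, C_{r-1} + \sum_{j=1}^{r-1} \binom{r}{j} C_j C_{r-j}, \quad r \ge 2, \qquad C_1 = \frac{1}{2}. \qquad (9)$$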
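The recurrence for the constants $C_r$ is easy to evaluate exactly. The following sketch assumes the recurrence $2C_r = (3r-4)rC_{r-1} + \sum_{j=1}^{r-1}\binom{r}{j}C_jC_{r-j}$ with $C_1 = 1/2$ and the moment formula $\mu_r = 2\sqrt{\pi}\,C_r/\Gamma((3r-1)/2)$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, gamma, pi, sqrt


def airy_constants(r_max):
    """Return {r: C_r} for 1 <= r <= r_max, computed exactly from the recurrence."""
    C = {1: Fraction(1, 2)}
    for r in range(2, r_max + 1):
        convolution = sum(comb(r, j) * C[j] * C[r - j] for j in range(1, r))
        C[r] = ((3 * r - 4) * r * C[r - 1] + convolution) / 2
    return C


def airy_moment(r, C):
    """r-th moment of the Airy distribution, as a float."""
    return 2 * sqrt(pi) * float(C[r]) / gamma((3 * r - 1) / 2)


C = airy_constants(4)
print(C)
print([airy_moment(r, C) for r in range(1, 5)])
```

The first moment equals $\sqrt{\pi}$, which matches the main term $c_\varphi\sqrt{\pi}\,n^{3/2}$ of the expectation after normalization by $c_\varphi n^{3/2}$.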

Since we will use the method of moments in order to establish our results, the following
well-known result about the Airy distribution is important (see, e.g., [2], where one can find
more details about the Airy distribution and some equivalent definitions).

Lemma 3.2. The Airy distribution is uniquely determined by its sequence of moments $(\mu_r)_{r \ge 1}$.

Definition 3.3. The Rayleigh distribution with parameter $\sigma > 0$ is the distribution of a random
variable $X_\sigma$ with probability density function

$$f_\sigma(x) = \frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}, \quad x \ge 0. \qquad (10)$$

The following basic fact about the Rayleigh distribution will be required in our analysis.

Lemma 3.4. The Rayleigh distribution is uniquely determined by its sequence of $r$-th moments
$(\nu_r)_{r \ge 1}$, which are given as follows:

$$\nu_r := \mathbb{E}(X_\sigma^r) = \sigma^r\, 2^{r/2}\, \Gamma\Big(1 + \frac{r}{2}\Big). \qquad (11)$$
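The moment formula for the Rayleigh distribution can be confirmed numerically. This sketch assumes the density $f_\sigma(x) = \frac{x}{\sigma^2}e^{-x^2/(2\sigma^2)}$ and compares a simple midpoint-rule integral of $x^r f_\sigma(x)$ with $\sigma^r 2^{r/2}\Gamma(1 + r/2)$; the function names and the integration parameters are illustrative choices.

```python
import math


def rayleigh_moment_formula(r, sigma):
    """Closed form sigma^r * 2^(r/2) * Gamma(1 + r/2)."""
    return sigma ** r * 2 ** (r / 2) * math.gamma(1 + r / 2)


def rayleigh_moment_numeric(r, sigma, steps=20000):
    """Midpoint-rule approximation of the r-th moment on [0, 12 sigma]."""
    upper = 12 * sigma                 # the tail beyond 12 sigma is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density = x / sigma ** 2 * math.exp(-x * x / (2 * sigma ** 2))
        total += x ** r * density * h
    return total


sigma = math.sqrt(2)                   # the parameter obtained for binary trees
for r in range(1, 5):
    print(r, rayleigh_moment_formula(r, sigma), rayleigh_moment_numeric(r, sigma))
```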

3.3. Results. Let $\mathcal{T}$ be the labelled family of simply generated trees associated to a
degree-weight sequence $(\varphi_\ell)_{\ell \ge 0}$, where the function $\varphi(t) := \sum_{\ell \ge 0} \varphi_\ell t^\ell$ has positive radius of convergence
$R$, and equation (3) has a minimal positive solution $\tau < R$. Furthermore, let $\rho = \tau/\varphi(\tau)$ (recall
the definitions of Subsection 2.2). Then the following holds:

Theorem 3.5 (Global behaviour). The random variable $I_n$, which counts the total number of
inversions in a random tree of size $n$ of $\mathcal{T}$ is, after proper normalization, asymptotically Airy
distributed:

It holds that $\mathbb{E}(I_n) \sim c_\varphi \sqrt{\pi}\, n^{3/2}$, where $c_\varphi = \frac{1}{\sqrt{8\rho\tau\varphi''(\tau)}}$, and

$$\frac{I_n}{c_\varphi n^{3/2}} \xrightarrow{(d)} I,$$

where $I$ is an Airy distributed random variable.

Theorem 3.6 (Local behaviour). The random variable $I_{n,j}$, which counts the number of
inversions of the kind $(i, j)$, with $i > j$ an ancestor of $j$, in a random tree of size $n$ of $\mathcal{T}$ has,
depending on the growth of $1 \le j = j(n) \le n$, the following asymptotic behaviour.

• Region $n - j \gg \sqrt{n}$: $I_{n,j}$ is, after proper normalization, asymptotically Rayleigh
distributed:

$$\frac{\sqrt{n}}{n - j}\, I_{n,j} \xrightarrow{(d)} X_\sigma,$$

where $X_\sigma$ is a Rayleigh distributed random variable with parameter $\sigma := \frac{1}{\sqrt{\rho\tau\varphi''(\tau)}}$.

• Region $n - j \sim \alpha\sqrt{n}$, with $\alpha \in \mathbb{R}^+$: $I_{n,j}$ converges in distribution to a discrete random
variable $Y_\gamma$, with

$$\mathbb{P}\{Y_\gamma = k\} = \frac{\gamma^2}{k!} \int_0^\infty x^{k+1} e^{-\frac{\gamma^2 x^2}{2} - x}\, dx, \quad k \ge 0,$$

and $\gamma := \frac{1}{\alpha}\sqrt{\rho\tau\varphi''(\tau)}$.

• Region $n - j \ll \sqrt{n}$: $I_{n,j}$ converges in distribution to a random variable with all its mass
concentrated at 0, i.e., $I_{n,j} \xrightarrow{(d)} 0$.
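The discrete limit law in the intermediate region can be explored numerically. The sketch below rests on an assumption about the exact form of the law, namely $\mathbb{P}\{Y_\gamma = k\} = \frac{\gamma^2}{k!}\int_0^\infty x^{k+1}e^{-\gamma^2x^2/2 - x}\,dx$, which describes a Poisson variable whose parameter is Rayleigh distributed with parameter $1/\gamma$. Under this assumption the probabilities sum to 1 and the mean is $\sqrt{\pi/2}/\gamma$. Function names, the integration cut-off and the step count are illustrative choices.

```python
import math


def prob_Y(k, gamma_, upper=40.0, steps=8000):
    """Midpoint-rule value of gamma^2 / k! * int_0^upper x^(k+1) exp(-gamma^2 x^2 / 2 - x) dx."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        # Work with logarithms so that x^(k+1) / k! does not overflow for large k.
        log_term = ((k + 1) * math.log(x) - x - 0.5 * (gamma_ * x) ** 2
                    - math.lgamma(k + 1))
        total += math.exp(log_term) * h
    return gamma_ ** 2 * total


gamma_ = 1.0
probs = [prob_Y(k, gamma_) for k in range(61)]
total_mass = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
print(total_mass, mean, math.sqrt(math.pi / 2) / gamma_)
```

For small values of $\gamma$ the cut-off `upper` and the number of terms would have to be increased, since the Rayleigh mixing distribution then spreads out.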

3.4. Examples: Before we prove these results, we apply them to our example tree families: 

• Binary trees: From the equation $2\tau(\tau + 1) = \tau\varphi'(\tau) = \varphi(\tau) = (\tau + 1)^2$ we get the positive
solution $\tau = 1$, and hence $c_\varphi = \frac{1}{2}$ and $\sigma = \sqrt{2}$. Thus, if we let $I_n$ denote the number
of inversions in a random binary tree of size $n$, then $\frac{2 I_n}{n^{3/2}}$ converges in distribution to
an Airy distributed random variable. Furthermore, for the number $I_{n,j}$ of inversions in
a random binary tree of size $n$ induced by node $j$, it holds that $\frac{\sqrt{n}}{n-j} I_{n,j}$ converges, for
$n - j \gg \sqrt{n}$, in distribution to a Rayleigh distributed random variable with parameter
$\sqrt{2}$.

• Ordered trees: The equation $\frac{\tau}{(1-\tau)^2} = \frac{1}{1-\tau}$ yields $\tau = \frac{1}{2}$, and further $c_\varphi = \frac{1}{4}$ and $\sigma = \frac{1}{\sqrt{2}}$.
Hence, for the number $I_n$ of inversions in a random ordered tree of size $n$, it holds
that $\frac{4 I_n}{n^{3/2}}$ is asymptotically Airy distributed. Furthermore, the normalized number of
inversions $\frac{\sqrt{n}}{n-j} I_{n,j}$ induced by node $j$ is, for $n - j \gg \sqrt{n}$, asymptotically Rayleigh
distributed with parameter $1/\sqrt{2}$.

We further note that for ordered trees the exact distribution of $I_{n,j}$ is given as follows
(for $1 \le j \le n$ and $0 \le k \le n - j$):

1 / £ \(2n - 2\(n - t - \\ 2n - 1 - 21 

{nJ } = (T-D (*&») Jh,-> U-^-fcA t )\ k )^ttt- < 12 > 

• Unordered trees: Here, one has $\tau = 1$ and thus $c_\varphi = \frac{1}{\sqrt{8}}$ and $\sigma = 1$. This shows that
$\frac{\sqrt{8}\, I_n}{n^{3/2}}$ converges in distribution to an Airy distributed random variable and that $\frac{\sqrt{n}}{n-j} I_{n,j}$
converges, for $n - j \gg \sqrt{n}$, in distribution to a Rayleigh distributed random variable
with parameter 1.

Also for unordered trees the exact distribution of $I_{n,j}$ can be stated explicitly. It
holds (for $1 \le j \le n$ and $0 \le k \le n - j$):

£=n-j-k v J 7 v 7 

• Cyclic trees: The positive real solution of the equation $\frac{\tau}{1-\tau} = 1 + \log\frac{1}{1-\tau}$ is numerically
given by $\tau \approx 0.682155$. One further gets $c_\varphi = \sqrt{(1-\tau)/8} \approx 0.199325$ and $\sigma = \sqrt{1-\tau} \approx
0.563776$. Thus, $\frac{I_n}{c_\varphi n^{3/2}}$ converges in distribution to an Airy distributed random variable,
and $\frac{\sqrt{n}}{n-j} I_{n,j}$ converges, for $n - j \gg \sqrt{n}$, in distribution to a Rayleigh distributed random
variable with parameter $\sigma$.
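The characteristic values $\tau$ and $\rho$ of the examples can be reproduced numerically by solving $t\varphi'(t) = \varphi(t)$ with bisection. The sketch below additionally evaluates the constants for binary, ordered and unordered trees under the assumption that $c_\varphi = 1/\sqrt{8\rho\tau\varphi''(\tau)}$ and $\sigma = 1/\sqrt{\rho\tau\varphi''(\tau)}$; for cyclic trees only $\tau$ and $\rho$ are computed. All function names and bracketing intervals are illustrative choices.

```python
import math


def solve_tau(phi, dphi, lo, hi, iters=200):
    """Bisection for the root of t * phi'(t) - phi(t), which is increasing in t."""
    g = lambda t: t * dphi(t) - phi(t)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def constants(phi, d2phi, tau):
    """Return (rho, c_phi, sigma) under the assumed formulas stated in the lead-in."""
    rho = tau / phi(tau)
    c_phi = 1 / math.sqrt(8 * rho * tau * d2phi(tau))
    sigma = 1 / math.sqrt(rho * tau * d2phi(tau))
    return rho, c_phi, sigma


families = {
    "binary": (lambda t: (1 + t) ** 2, lambda t: 2 * (1 + t), lambda t: 2.0, (0.1, 5.0)),
    "ordered": (lambda t: 1 / (1 - t), lambda t: 1 / (1 - t) ** 2,
                lambda t: 2 / (1 - t) ** 3, (0.1, 0.9)),
    "unordered": (math.exp, math.exp, math.exp, (0.1, 5.0)),
}

results = {}
for name, (phi, dphi, d2phi, (lo, hi)) in families.items():
    tau = solve_tau(phi, dphi, lo, hi)
    results[name] = (tau,) + constants(phi, d2phi, tau)
    print(name, results[name])

# Cyclic trees: phi(t) = 1 + log(1 / (1 - t)), phi'(t) = 1 / (1 - t).
tau_cyclic = solve_tau(lambda t: 1 + math.log(1 / (1 - t)), lambda t: 1 / (1 - t), 0.1, 0.99)
rho_cyclic = 1 - tau_cyclic            # rho = 1 / phi'(tau)
print("cyclic", tau_cyclic, rho_cyclic)
```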



4. Proof of the results concerning the global behaviour 



4.1. Short overview of the proof. We prove our result given in Theorem 3.5 by using the
method of moments, i.e., we show that the moments of $I_n$ converge (after proper normalization)
to the moments of the Airy distribution. Since this distribution is uniquely determined by
its moments, the convergence result then follows directly from the theorem of Fréchet and
Shohat [8]. To start with, we do not study the random variable $I_n$ directly, but consider a
closely related random variable $\hat{I}_n$. Using the tree decomposition as in [6], we then obtain a
$q$-difference-differential equation for a suitably chosen generating function which encodes the
distribution of $\hat{I}_n$. From this equation, we can "pump" the moments of $\hat{I}_n$ using techniques
from [4] and singularity analysis, and finally transfer our result to $I_n$.

4.2. Introduction of $\hat{I}_n$ and generating functions. We let $\hat{\mathcal{T}}$ be the subset of $\mathcal{T}$ which
consists exactly of those trees in which the root has label 1. Obviously, the total weight of trees
of size $n$ in $\hat{\mathcal{T}}$ is then given by $T_n/n$. Also note that each tree in $\hat{\mathcal{T}}$ has the nice property that the
root is not part of any inversion. Hence, the total number of inversions can just be obtained
by summing up the contributions of the individual subtrees of the root. This fact will later be
useful when we translate a decomposition of the trees in $\hat{\mathcal{T}}$ to generating functions.

We let $\hat{I}_n$ denote the number of inversions in a random tree of size $n$ in $\hat{\mathcal{T}}$, where each element
of $\hat{\mathcal{T}}$ of size $n$ is chosen with probability proportional to its weight.
Furthermore, we introduce the generating function

$$F(z,q) := \sum_{n \ge 1} \sum_{k \ge 0} \mathbb{P}\{\hat{I}_n = k\}\, \frac{T_n}{n}\, q^k \frac{z^n}{n!}. \qquad (14)$$

Note that $n!\,[z^n q^k] F(z,q) = \mathbb{P}\{\hat{I}_n = k\}\, \frac{T_n}{n}$ is the total weight of all trees of size $n$ in $\hat{\mathcal{T}}$ which
contain exactly $k$ inversions. Moreover, observe that $VF(z,q) = F(z,1)$ is just the exponential
generating function of $\big(\frac{T_n}{n}\big)_{n \ge 1}$, and hence we have the relation

$$Z D_z V F(z,q) = T(z), \qquad (15)$$

which we will use frequently. We further introduce the functions

$$f_r(z) := V D_q^r F(z,q),$$

which are generating functions of the factorial moments $\mathbb{E}\big(\hat{I}_n^{\underline{r}}\big)$ of $\hat{I}_n$, in the sense that

$$f_r(z) = \sum_{n \ge 1} \mathbb{E}\big(\hat{I}_n^{\underline{r}}\big)\, \frac{T_n}{n}\, \frac{z^n}{n!}. \qquad (16)$$

Clearly, we can recover the $r$-th factorial moment of $\hat{I}_n$ from (16) by

$$\mathbb{E}\big(\hat{I}_n^{\underline{r}}\big) = \frac{[z^n] f_r(z)}{[z^n] f_0(z)},$$

but as we will see later, it is more convenient to use

$$\mathbb{E}\big(\hat{I}_n^{\underline{r}}\big) = \frac{[z^n]\, z f_r'(z)}{[z^n]\, z f_0'(z)} = \frac{[z^n]\, z f_r'(z)}{[z^n]\, T(z)}, \qquad (17)$$

where the second equality follows from (15).



4.3. The $q$-difference-differential equation for $F(z,q)$. It turns out that $F(z,q)$ satisfies
a certain equation involving a $q$-difference operator $H$ which is very similar to the one Flajolet,
Poblete and Viola used in [4] in their analysis of linear probing hashing. In our case, we define
$H$ by

$$HG(z,q) := \frac{G(z,q) - G(qz,q)}{1 - q}.$$

Using this, we get:

Lemma 4.1. The function $F(z,q)$ defined by (14) satisfies

$$D_z F(z,q) = \varphi(HF(z,q)). \qquad (18)$$

Proof. This equation can be obtained by establishing mutually dependent recurrences for the
total weights $T_{n,k} := \mathbb{P}\{I_n = k\}\, T_n$ and $\hat{T}_{n,k} := \mathbb{P}\{\hat{I}_n = k\}\, \frac{T_n}{n}$ of trees of size $n$ with $k$ inversions
in $\mathcal{T}$ and $\hat{\mathcal{T}}$, respectively. Nevertheless, we confine ourselves to giving a combinatorial argument
at this point.

In order to derive (18), we establish relations between $\mathcal{T}$ and $\hat{\mathcal{T}}$, which can be translated into
functional equations for $F(z,q) = \sum_{n \ge 1} \sum_{k \ge 0} \hat{T}_{n,k}\, q^k \frac{z^n}{n!}$ and

$$T(z,q) := \sum_{n \ge 1} \sum_{k \ge 0} T_{n,k}\, q^k \frac{z^n}{n!}.$$

For this purpose, we consider the sets $\mathcal{T}_n$ and $\hat{\mathcal{T}}_n$, which contain exactly the trees of size
$n$ of $\mathcal{T}$ and $\hat{\mathcal{T}}$, respectively. Clearly, $\mathcal{T}_n$ can be partitioned into $n$ disjoint subsets $\mathcal{T}_n^{(1)} =
\hat{\mathcal{T}}_n, \mathcal{T}_n^{(2)}, \dots, \mathcal{T}_n^{(n)}$, where $\mathcal{T}_n^{(j)}$ contains exactly those trees in which the root is labelled by $j$.

Now consider the bijective mapping between $\hat{\mathcal{T}}_n$ and $\mathcal{T}_n^{(2)}$ which is obtained by just switching
the labels 1 and 2 in each tree, and leaving all other labels and the structure of each tree
unchanged. Since this mapping does not alter the relative order of any pair of nodes except
$(1,2)$, it clearly holds that each tree of $\hat{\mathcal{T}}_n$ with $k$ inversions is mapped to a tree of $\mathcal{T}_n^{(2)}$ with
$k + 1$ inversions. Repeating this argument, we see that each tree in $\hat{\mathcal{T}}_n$ with $k$ inversions can
bijectively be mapped to a tree with $k + j - 1$ inversions in $\mathcal{T}_n^{(j)}$. This leads for the generating
functions $F(z,q)$ and $T(z,q)$ to the equation

$$T(z,q) = \sum_{n \ge 1} \sum_{k \ge 0} \hat{T}_{n,k} \big(1 + q + \dots + q^{n-1}\big)\, q^k \frac{z^n}{n!} = \sum_{n \ge 1} \sum_{k \ge 0} \hat{T}_{n,k}\, \frac{1 - q^n}{1 - q}\, q^k \frac{z^n}{n!} = HF(z,q). \qquad (19)$$

Next, remember that $\mathcal{T}$ is defined by the formal equation $\mathcal{T} = \bigcirc * \varphi(\mathcal{T})$, and that $\hat{\mathcal{T}}$ consists
exactly of those trees of $\mathcal{T}$ in which the root has label 1. It thus follows that $\hat{\mathcal{T}}$ satisfies the
formal equation

$$\hat{\mathcal{T}} = ① * \varphi(\mathcal{T}). \qquad (20)$$

Due to the observation that the root node ① of any tree in $\hat{\mathcal{T}}$ does not contribute to the number
of inversions, equation (20) can be translated by an application of the symbolic method to the
differential equation

$$D_z F(z,q) = \varphi(T(z,q)).$$

Using equation (19), we thus obtain (18). □
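The label-switching bijection used in the proof can be illustrated by brute force. The sketch below enumerates all rooted labelled unordered trees of size 4 (all weights equal to 1), groups them by root label and number of inversions, and checks that the trees with root label $j$ and $k + j - 1$ inversions are as numerous as the trees with root label 1 and $k$ inversions. The representation by parent maps and all names are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def reaches_root(v, parent, root, n):
    """True if following parent pointers from v arrives at the root within n steps."""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return False


def inversions(parent, root):
    """Number of pairs (i, j) with i > j and i a proper ancestor of j."""
    total = 0
    for j in parent:
        i = parent[j]
        while True:
            if i > j:
                total += 1
            if i == root:
                break
            i = parent[i]
    return total


def inversion_counts_by_root(n):
    """Return {root: Counter(inversions -> number of trees)} for unordered trees on {1..n}."""
    labels = range(1, n + 1)
    counts = {root: Counter() for root in labels}
    for root in labels:
        others = [v for v in labels if v != root]
        for choice in product(labels, repeat=len(others)):
            parent = dict(zip(others, choice))
            if all(reaches_root(v, parent, root, n) for v in others):
                counts[root][inversions(parent, root)] += 1
    return counts


n = 4
counts = inversion_counts_by_root(n)
shift_ok = all(counts[j][k + j - 1] == counts[1][k]
               for j in range(1, n + 1) for k in counts[1])
print(dict(counts[1]), shift_ok)
```

For $n = 4$ the trees with root label 1 are counted by $6 + 6q + 3q^2 + q^3$, and every other root label contributes the same polynomial multiplied by $q^{j-1}$, which is the combinatorial content of the factor $1 + q + \dots + q^{n-1}$.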



4.4. Application of the pumping method. In order to extract expressions for the functions
$f_r(z)$ as defined in (16) from (18), we use the pumping method from [4]. This method basically
rests on the idea of applying the operator $V D_q^r$ to the given functional equation involving $H$,
and using a "commutation rule" for the operators $V D_q^r$ and $H$. Since our operator $H$ is slightly
different from the one in [4], we will first establish the suitable commutation rule for our case.

Lemma 4.2. The operator $V D_q^r H$ satisfies the operator equation

$$V D_q^r H = \sum_{s=0}^{r} \binom{r}{s} \frac{1}{s+1}\, Z^{s+1} D_z^{s+1}\, V D_q^{r-s}. \qquad (21)$$

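Proof. Since all occurring operators are linear, it suffices to show that the two sides of the
equation coincide when applied to a function of the form $G(z,q) = q^k z^n$. Remember that
$H(q^k z^n) = q^k (1 + q + \dots + q^{n-1}) z^n$, and thus we have

$$V D_q^r H(q^k z^n) = z^n \sum_{i=0}^{n-1} V D_q^r \big(q^{k+i}\big) = z^n \sum_{i=0}^{n-1} (k+i)^{\underline{r}} = z^n \sum_{i=0}^{n-1} \sum_{s=0}^{r} \binom{r}{s} k^{\underline{r-s}}\, i^{\underline{s}}$$

$$= z^n \sum_{s=0}^{r} \binom{r}{s} k^{\underline{r-s}} \sum_{i=0}^{n-1} i^{\underline{s}} = z^n \sum_{s=0}^{r} \binom{r}{s} k^{\underline{r-s}}\, s! \binom{n}{s+1}$$

$$= \sum_{s=0}^{r} \binom{r}{s} k^{\underline{r-s}}\, \frac{n^{\underline{s+1}}}{s+1}\, z^n = \sum_{s=0}^{r} \binom{r}{s} \frac{1}{s+1}\, Z^{s+1} D_z^{s+1}\, V D_q^{r-s} \big(q^k z^n\big).$$

□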
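On monomials $q^k z^n$ the commutation rule reduces to an identity between falling factorials, $\sum_{i=0}^{n-1} (k+i)^{\underline{r}} = \sum_{s=0}^{r} \binom{r}{s} k^{\underline{r-s}}\, \frac{n^{\underline{s+1}}}{s+1}$, which can be tested directly. The sketch below is an illustration of this reduction only; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def falling(x, s):
    """Falling factorial x (x - 1) ... (x - s + 1), with the empty product equal to 1."""
    result = 1
    for i in range(s):
        result *= x - i
    return result


def lhs(r, k, n):
    """Coefficient of z^n in V D_q^r H (q^k z^n)."""
    return sum(falling(k + i, r) for i in range(n))


def rhs(r, k, n):
    """Coefficient of z^n produced by the right-hand side of the commutation rule."""
    return sum(Fraction(comb(r, s) * falling(k, r - s) * falling(n, s + 1), s + 1)
               for s in range(r + 1))


rule_holds = all(lhs(r, k, n) == rhs(r, k, n)
                 for r in range(5) for k in range(6) for n in range(1, 7))
print(rule_holds)
```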



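Using (21), we can now establish a recurrence for the derivatives $f_r'(z)$ of the factorial moment
generating functions:

Lemma 4.3. The factorial moment generating functions $f_r(z)$ satisfy, for $r \ge 1$,

$$f_r'(z) = \frac{1}{1 - z\varphi'(T(z))} \Bigg( \varphi'(T(z)) \sum_{\ell=1}^{r} \binom{r}{\ell} \frac{z^{\ell+1}}{\ell+1}\, f_{r-\ell}^{(\ell+1)}(z) + \sum_{(k_1,\dots,k_{r-1}) \in B_r} \frac{r!}{k_1! \cdots k_{r-1}!}\, \varphi^{(k_1+\dots+k_{r-1})}(T(z)) \prod_{m=1}^{r-1} \Bigg( \frac{1}{m!} \sum_{s=0}^{m} \binom{m}{s} \frac{z^{s+1}}{s+1}\, f_{m-s}^{(s+1)}(z) \Bigg)^{k_m} \Bigg), \qquad (22)$$

where $B_1 := \emptyset$ and

$$B_r := \big\{ (k_1, k_2, \dots, k_{r-1}) \in \mathbb{N}_0^{r-1} : k_1 + 2k_2 + \dots + (r-1)k_{r-1} = r \big\},$$

for $r \ge 2$.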



Proof. We apply $V D_q^r$ to (18) and express $D_q^r \varphi(HF(z,q))$ using Faà di Bruno's formula for higher
derivatives of composite functions,

$$D_q^r \varphi(HF(z,q)) = \sum_{(k_1,\dots,k_r) \in A_r} \frac{r!}{k_1! \cdots k_r!}\, \varphi^{(k_1+\dots+k_r)}(HF(z,q)) \prod_{m=1}^{r} \Big( \frac{D_q^m HF(z,q)}{m!} \Big)^{k_m},$$

where $A_r := \{(k_1, k_2, \dots, k_r) \in \mathbb{N}_0^r : k_1 + 2k_2 + \dots + r k_r = r\}$. We then obtain the claimed
result by applying Lemma 4.2, solving for $f_r'(z)$ and using the fact that

$$V\varphi(HF(z,q)) = \varphi(VHF(z,q)) = \varphi(Z D_z V F(z,q)) = \varphi(T(z)),$$

which follows from (21) and (15). □



4.5. Singularity analysis. We now investigate the singular behaviour of the functions in (22)
in order to compute the factorial moments $\mathbb{E}\big(\hat{I}_n^{\underline{r}}\big)$ asymptotically. In the following, we carry
out only the computations for the case $d = 1$ (where $d$ is defined by (4) and thus gives the
number of dominant singularities of the functions considered). The general case runs completely
analogously: when applying singularity analysis, one just has to take care of the contributions of
all $d$ singularities and add them.

In a first step, we want to find an asymptotic formula for the expected value $\mathbb{E}(\hat{I}_n)$. From
Lemma 4.3 we have

$$f_1'(z) = \frac{\varphi'(T(z))\, \frac{z^2}{2}\, f_0''(z)}{1 - z\varphi'(T(z))},$$

and using the fact that $f_0''(z) = T'(z)\varphi'(T(z)) = \frac{\varphi(T(z))\,\varphi'(T(z))}{1 - z\varphi'(T(z))}$, which is easily obtained by
differentiating (1) and (15), we get

$$z f_1'(z) = \frac{1}{2}\, \frac{\big(z\varphi'(T(z))\big)^2\, T(z)}{\big(1 - z\varphi'(T(z))\big)^2}.$$

Note that $z\varphi'(T(z)) \ne 1$ for $|z| < \rho$, which can be seen by differentiating (1), and thus $f_1'(z)$
inherits the dominant singularity at $z = \rho$ from $T(z)$ and $\varphi'(T(z))$. Using the expansions (5)
and (8), we find

$$z f_1'(z) = \frac{1}{2}\, \frac{\big(1 + O((\rho - z)^{1/2})\big)\, \tau\, \big(1 + O((\rho - z)^{1/2})\big)}{2\rho\tau\varphi''(\tau)\, \big(1 - \frac{z}{\rho}\big) \big(1 + O((\rho - z)^{1/2})\big)} = \frac{1}{4\rho\varphi''(\tau)}\, \frac{1}{1 - \frac{z}{\rho}}\, \big(1 + O((\rho - z)^{1/2})\big), \quad z \to \rho. \qquad (23)$$

By applying basic singularity analysis, this immediately yields

$$[z^n]\, z f_1'(z) = \frac{1}{4\rho^{n+1}\varphi''(\tau)} \big(1 + O(n^{-1/2})\big).$$

Now, using (17) and (6), we get the expected value

$$\mathbb{E}(\hat{I}_n) = \frac{[z^n]\, z f_1'(z)}{[z^n]\, T(z)} = \frac{\sqrt{\pi}\, n^{3/2}}{\rho \sqrt{8\varphi(\tau)\varphi''(\tau)}} \big(1 + O(n^{-1/2})\big) = c_\varphi \sqrt{\pi}\, n^{3/2} \big(1 + O(n^{-1/2})\big),$$

where $c_\varphi$ is defined as in Theorem 3.5 (recall that $\rho\varphi(\tau) = \tau$).



We will now consider $f_r(z)$ for general $r$. It turns out that all $f_r'(z)$ have a unique dominant
singularity at $z = \rho$. The singular expansions around this point are given in the following
lemma.

Lemma 4.4. For $r \ge 1$, each $f_r'(z)$ has a unique dominant singularity at $z = \rho$, where the
expansion

$$z f_r'(z) = \frac{C_r \sqrt{2\varphi(\tau)/\varphi''(\tau)}}{\big(\rho\sqrt{8\varphi(\tau)\varphi''(\tau)}\big)^r}\, \frac{1}{\big(1 - \frac{z}{\rho}\big)^{(3r-1)/2}}\, \big(1 + O((\rho - z)^{1/2})\big) \qquad (24)$$

holds. Here, the constants $C_r$ are defined as in (9).



Proof. One easily checks that in the case r = 1 equation ([24]) coincides with ( 23 ) . For r > 1 we 
proceed by induction, following the inductive definition pTof the constants C r . So let r > 1 
and assume that (24) holds for all functions fj(z) with 1 < j < r. By the rules for singular 
differentiation [lj we then also have the following singular expansions for the /c-th derivatives 

fj^ of the functions fj(z), for all k > 1 and 1 < j < r: 



zff\z) 



<p(r) 



fc-i 



(3j-3+2fc)/2 
11 



l + 0[(p-z 



(25) 



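From this one concludes that the dominant contributions in (22) can only arise from the terms
corresponding to $t = 1$ and $s = 0$, i.e.,

$$z f_r'(z) = \frac{z}{1 - z\varphi'(T(z))}\Biggl[\frac{r}{2}\,\varphi'(T(z))\, z^2 f_{r-1}''(z) + \sum_{(k_1,\dots,k_{r-1}) \in B_r} \frac{r!}{k_1!\cdots k_{r-1}!}\,\varphi^{(k_1+\dots+k_{r-1})}(T(z)) \prod_{m=1}^{r-1}\Bigl(\frac{z f_m'(z)}{m!}\Bigr)^{k_m}\Biggr]\Bigl(1+O\bigl((\rho-z)^{1/2}\bigr)\Bigr).$$

Now note that

$$\prod_{m=1}^{r-1}\Bigl(\frac{z f_m'(z)}{m!}\Bigr)^{k_m} = O\Bigl(\frac{1}{(\rho-z)^{(3r-(k_1+\dots+k_{r-1}))/2}}\Bigr),$$

hence the dominant terms in the remaining sum correspond to those $(k_1,\dots,k_{r-1}) \in B_r$ with
$k_1 + \dots + k_{r-1} = 2$, and we thus get

$$z f_r'(z) = \frac{z}{1 - z\varphi'(T(z))}\Biggl[\frac{r}{2}\,\varphi'(T(z))\, z^2 f_{r-1}''(z) + \frac{r!}{2}\,\varphi''(T(z))\, z^2 \sum_{s=1}^{r-1}\frac{f_s'(z)\, f_{r-s}'(z)}{s!\,(r-s)!}\Biggr]\Bigl(1+O\bigl((\rho-z)^{1/2}\bigr)\Bigr).$$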



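Now, expanding the occurring functions using (5), (8) and (25), we obtain after some
simplifications

$$\begin{aligned}
z f_r'(z) &= \frac{1}{\sqrt{2\varphi(\tau)\varphi''(\tau)}}\Biggl[\frac{r(3r-4)}{4\rho}\sqrt{\frac{2\varphi(\tau)}{\varphi''(\tau)}}\,\frac{C_{r-1}}{\bigl(\rho\sqrt{8\varphi(\tau)\varphi''(\tau)}\bigr)^{r-1}} + \frac{\varphi''(\tau)}{2}\cdot\frac{2\varphi(\tau)}{\varphi''(\tau)}\sum_{s=1}^{r-1}\binom{r}{s}\frac{C_s C_{r-s}}{\bigl(\rho\sqrt{8\varphi(\tau)\varphi''(\tau)}\bigr)^{r}}\Biggr] \\
&\qquad \times \frac{1}{\bigl(1-\frac{z}{\rho}\bigr)^{(3r-1)/2}}\Bigl(1+O\bigl((\rho-z)^{1/2}\bigr)\Bigr) \\
&= \sqrt{\frac{2\varphi(\tau)}{\varphi''(\tau)}}\;\frac{1}{\bigl(\rho\sqrt{8\varphi(\tau)\varphi''(\tau)}\bigr)^{r}}\cdot\frac{1}{2}\Biggl[(3r-4)\,r\,C_{r-1} + \sum_{s=1}^{r-1}\binom{r}{s} C_s C_{r-s}\Biggr]\frac{1}{\bigl(1-\frac{z}{\rho}\bigr)^{(3r-1)/2}}\Bigl(1+O\bigl((\rho-z)^{1/2}\bigr)\Bigr) \\
&= \sqrt{\frac{2\varphi(\tau)}{\varphi''(\tau)}}\;\frac{C_r}{\bigl(\rho\sqrt{8\varphi(\tau)\varphi''(\tau)}\bigr)^{r}}\cdot\frac{1}{\bigl(1-\frac{z}{\rho}\bigr)^{(3r-1)/2}}\Bigl(1+O\bigl((\rho-z)^{1/2}\bigr)\Bigr), \quad z \to \rho.
\end{aligned}$$

□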



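Lemma 4.4 can now be used in order to compute the moments of $\hat{I}_n$ asymptotically:

Lemma 4.5. The random variable $\hat{I}_n$ satisfies

$$\mathbb{E}(\hat{I}_n^{\,r}) = \frac{2\sqrt{\pi}}{\Gamma\bigl(\frac{3r-1}{2}\bigr)}\, C_r\, c_\varphi^{\,r}\, n^{3r/2}\bigl(1+O(n^{-1/2})\bigr).$$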



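Proof. By singularity analysis, it follows from Lemma 4.4 that

$$[z^n]\, z f_r'(z) = \sqrt{\frac{2\varphi(\tau)}{\varphi''(\tau)}}\;\frac{C_r}{\bigl(\rho\sqrt{8\varphi(\tau)\varphi''(\tau)}\bigr)^{r}}\cdot\frac{n^{(3r-3)/2}}{\rho^{n}\,\Gamma\bigl(\frac{3r-1}{2}\bigr)}\bigl(1+O(n^{-1/2})\bigr),$$

and together with (17) and (6) this shows

$$\mathbb{E}(\hat{I}_n^{\,\underline{r}}) = \frac{[z^n]\, z f_r'(z)}{[z^n]\, T(z)} = \frac{2\sqrt{\pi}}{\Gamma\bigl(\frac{3r-1}{2}\bigr)}\, C_r\, c_\varphi^{\,r}\, n^{3r/2}\bigl(1+O(n^{-1/2})\bigr).$$

Using the relation between the factorial moments and the ordinary moments of a random
variable $Y$:

$$\mathbb{E}(Y^r) = \sum_{k=0}^{r} \genfrac\{\}{0pt}{}{r}{k}\, \mathbb{E}(Y^{\underline{k}}), \tag{26}$$

with $\genfrac\{\}{0pt}{}{r}{k}$ the Stirling numbers of second kind, we obtain that
$\mathbb{E}(\hat{I}_n^{\,r}) = \mathbb{E}(\hat{I}_n^{\,\underline{r}}) + O\bigl(\mathbb{E}(\hat{I}_n^{\,\underline{r-1}})\bigr)$,
and hence we get the desired result. □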

4.6. Transfer of the result to $I_n$. We now transfer the result for $\hat{I}_n$ to the random variable
$I_n$, which counts the total number of inversions in a random tree of size $n$ of $\mathcal{T}$. In fact, we
prove that the moments of $\hat{I}_n$ and $I_n$ coincide asymptotically:

Lemma 4.6. The random variable $I_n$ satisfies

$$\mathbb{E}(I_n^{\,r}) = \frac{2\sqrt{\pi}}{\Gamma\bigl(\frac{3r-1}{2}\bigr)}\, C_r\, c_\varphi^{\,r}\, n^{3r/2}\bigl(1+O(n^{-1/2})\bigr). \tag{27}$$

Proof. The relation between $\mathcal{T}$ and $\hat{\mathcal{T}}$ (compare equation (19)) directly translates to the
following relation between the moments of $I_n$ and $\hat{I}_n$:

$$\mathbb{E}(I_n^{\,r}) = \frac{1}{n}\Bigl(\mathbb{E}(\hat{I}_n^{\,r}) + \mathbb{E}\bigl((\hat{I}_n + 1)^r\bigr) + \dots + \mathbb{E}\bigl((\hat{I}_n + n - 1)^r\bigr)\Bigr).$$

From this, one easily deduces that $\mathbb{E}(I_n^{\,r}) = \mathbb{E}(\hat{I}_n^{\,r}) + O\bigl(n^{(3r-1)/2}\bigr)$, and hence (27)
follows directly from Lemma 4.5. □



By comparing (27) with $\mu_r$ in Definition 3.1 we conclude that the moments of the normalized
random variable $\frac{I_n}{c_\varphi n^{3/2}}$ converge to the moments of the Airy distribution. Due to Lemma 3.2
the convergence in distribution of $\frac{I_n}{c_\varphi n^{3/2}}$ to an Airy distributed random variable thus
follows directly from the theorem of Fréchet and Shohat.



This finishes the proof of our result on the total number of inversions. 
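
The constants $C_r$ and the resulting moment sequence can be tabulated mechanically. The following is a minimal sketch, assuming that the inductive definition (9) is the recurrence $2C_r = (3r-4)\,r\,C_{r-1} + \sum_{s=1}^{r-1}\binom{r}{s} C_s C_{r-s}$ with $C_1 = \frac12$ (the form used in the proof of Lemma 4.4) and that $\mu_r = \frac{2\sqrt{\pi}}{\Gamma((3r-1)/2)}\, C_r$, as suggested by (27). The function names are illustrative only.

```python
from fractions import Fraction
from math import comb, gamma, pi, sqrt

def airy_constants(r_max):
    """Return [C_1, ..., C_{r_max}] from the assumed recurrence
    2*C_r = (3r-4)*r*C_{r-1} + sum_{s=1}^{r-1} binom(r,s)*C_s*C_{r-s}, C_1 = 1/2."""
    C = {1: Fraction(1, 2)}
    for r in range(2, r_max + 1):
        # convolution term coming from the quadratic part of the recurrence
        conv = sum(comb(r, s) * C[s] * C[r - s] for s in range(1, r))
        C[r] = ((3 * r - 4) * r * C[r - 1] + conv) / 2
    return [C[r] for r in range(1, r_max + 1)]

def airy_moment(r, c_r):
    """Assumed r-th moment mu_r = 2*sqrt(pi)*C_r / Gamma((3r-1)/2)."""
    return 2 * sqrt(pi) * float(c_r) / gamma((3 * r - 1) / 2)

constants = airy_constants(4)
moments = [airy_moment(r, c) for r, c in enumerate(constants, start=1)]
```

Under these assumptions the first moment equals $\sqrt{\pi}$, which agrees with the leading term of the expected value obtained from (23).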



5. Proof of the results concerning the local behaviour 



5.1. The generating functions approach. A main ingredient in the proof of Theorem 3.6
concerning the behaviour of the random variable $I_{n,j}$ is to introduce and study a suitable
generating function for the probabilities $\mathbb{P}\{I_{n,j} = k\}$, which reflects in a simple way the recursive
description of a tree as a root node and its subtrees. It turns out that the following trivariate
generating function is appropriate:

$$N(z,u,q) := \sum_{m \ge 0}\sum_{j \ge 1}\sum_{k \ge 0} \mathbb{P}\{I_{m+j,j} = k\}\, T_{m+j}\, \frac{z^{j-1}}{(j-1)!}\,\frac{u^{m}}{m!}\, q^{k}. \tag{28}$$

Proposition 5.1. The generating function $N(z,u,q)$ is given by the following explicit formula:

$$N(z,u,q) = \frac{\varphi(T(z+u))}{1 - (z + uq)\,\varphi'(T(z+u))}.$$

Proof. We will show the functional equation

$$N(z,u,q) = \varphi(T(z+u)) + z\,\varphi'(T(z+u))\,N(z,u,q) + uq\,\varphi'(T(z+u))\,N(z,u,q), \tag{29}$$

which is equivalent to the statement of Proposition 5.1. To do this we introduce specifically
tricoloured trees: in each tree $T \in \mathcal{T}$ exactly one node is coloured red, all nodes with a label
smaller than the red node are coloured white, whereas all nodes with a label larger than the red
node are coloured black. Let us denote by $\mathcal{T}_C$ the family of all such tricoloured trees. Then in
the generating function $N(z,u,q)$ the variable $z$ encodes the white nodes, the variable $u$ encodes
the black nodes, whereas $q$ encodes the black ancestors of the red node, i.e.,

$$N(z,u,q) = \sum_{T_C \in \mathcal{T}_C} w(T_C)\, \frac{z^{\#\,\text{white}}}{(\#\,\text{white})!}\,\frac{u^{\#\,\text{black}}}{(\#\,\text{black})!}\, q^{\#\,\text{black ancestors of the red node}}.$$

Since the black nodes as well as the white nodes are labelled it is appropriate to use a double
exponential generating function.

As auxiliary family we consider specifically bicoloured trees: the nodes in each tree $T \in \mathcal{T}$
are coloured black and white in such a way that each white node has a label smaller than any
black node (i.e., all nodes up to a certain label are coloured white, whereas all remaining nodes
are coloured black). Let us denote by $\mathcal{T}_B$ the set of all such bicoloured trees. The double
exponential generating function of bicoloured trees,

$$B(z,u) := \sum_{T_B \in \mathcal{T}_B} w(T_B)\, \frac{z^{\#\,\text{white}}}{(\#\,\text{white})!}\,\frac{u^{\#\,\text{black}}}{(\#\,\text{black})!},$$

can be computed easily. It holds:

$$\begin{aligned}
B(z,u) &= \sum_{n \ge 1}\;\sum_{T \in \mathcal{T}:\, |T| = n} w(T) \sum_{m=0}^{n} \frac{z^{n-m} u^{m}}{(n-m)!\, m!} = \sum_{n \ge 1}\sum_{m=0}^{n} \frac{z^{n-m} u^{m}}{(n-m)!\, m!} \sum_{T \in \mathcal{T}:\, |T| = n} w(T) \\
&= \sum_{n \ge 1} T_n \sum_{m=0}^{n} \frac{z^{n-m} u^{m}}{(n-m)!\, m!} = \sum_{n \ge 1} \frac{T_n}{n!}\,(z+u)^n = T(z+u).
\end{aligned} \tag{30}$$

Now we consider the decomposition of a tricoloured tree $T_C \in \mathcal{T}_C$ into the root node $\mathrm{root}(T_C)$
and its $\ell \ge 0$ subtrees $T_1, \dots, T_\ell$. Thus the degree-weight of the root node is given by $\varphi_\ell$. Three
cases may occur.

(i) The root node is the red node. Then the red node does not have black ancestors and all
of the subtrees $T_1, \dots, T_\ell$ are, after order preserving relabellings, specifically bicoloured
trees, i.e., elements of $\mathcal{T}_B$.
(ii) The root node is a white node. Then the red node is contained in one of the $\ell$ subtrees;
let us assume it is $T_s$. After an order preserving relabelling this subtree is itself an
element of $\mathcal{T}_C$, whereas all remaining subtrees are, after order preserving relabellings,
elements of $\mathcal{T}_B$. Moreover, the number of black ancestors of the red node in $T_C$ is the
same as the number of black ancestors of the red node in the subtree $T_s$.
(iii) The root node is a black node. Again the red node is contained in one of the $\ell$ subtrees; let
us assume it is $T_s$. After an order preserving relabelling this subtree is an element of
$\mathcal{T}_C$, whereas all remaining subtrees are, after order preserving relabellings, elements of
$\mathcal{T}_B$. But in this case the number of black ancestors of the red node in $T_C$ is one more
than the number of black ancestors of the red node in the subtree $T_s$.



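Considering all tricoloured trees of $\mathcal{T}_C$ and taking into account (30) the above decomposition
leads to the stated equation (29) for $N(z,u,q)$:

$$\begin{aligned}
N(z,u,q) &= \sum_{\ell \ge 0} \varphi_\ell\, (T(z+u))^{\ell} + z \sum_{\ell \ge 1} \ell\, \varphi_\ell\, (T(z+u))^{\ell-1}\, N(z,u,q) + uq \sum_{\ell \ge 1} \ell\, \varphi_\ell\, (T(z+u))^{\ell-1}\, N(z,u,q) \\
&= \varphi(T(z+u)) + z\,\varphi'(T(z+u))\,N(z,u,q) + uq\,\varphi'(T(z+u))\,N(z,u,q).
\end{aligned}$$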
□

5.2. Computing the factorial moments. Starting with the explicit formula for the trivariate
generating function $N(z,u,q)$ given in Proposition 5.1 we will compute the $r$-th factorial
moments of $I_{n,j}$. According to the definition (28) of $N(z,u,q)$ one obtains:

$$\begin{aligned}
\mathbb{E}(I_{n,j}^{\,\underline{r}}) &= \frac{(j-1)!\,(n-j)!}{T_n}\,[z^{j-1}u^{n-j}]\, \frac{\partial^r}{\partial q^r} N(z,u,q)\Big|_{q=1} \\
&= \frac{(j-1)!\,(n-j)!\, r!}{T_n}\,[z^{j-1}u^{n-j-r}]\, \frac{(\varphi'(T(z+u)))^{r}\,\varphi(T(z+u))}{\bigl(1 - (z+u)\,\varphi'(T(z+u))\bigr)^{r+1}}.
\end{aligned}$$

Since for any power series $g(x)$ it holds:

$$[z^{k}u^{l}]\, g(z+u) = \binom{k+l}{k}\,[z^{k+l}]\, g(z),$$

one further obtains the following expression, which will be the starting point for our asymptotic
considerations:

$$\begin{aligned}
\mathbb{E}(I_{n,j}^{\,\underline{r}}) &= \frac{(j-1)!\,(n-j)!\, r!}{T_n}\binom{n-r-1}{j-1}\,[z^{n-r-1}]\, \frac{(\varphi'(T(z)))^{r}\,\varphi(T(z))}{\bigl(1 - z\varphi'(T(z))\bigr)^{r+1}} \\
&= \frac{(j-1)!\,(n-j)!\, r!}{T_n}\binom{n-r-1}{j-1}\,[z^{n-r}]\, \frac{(\varphi'(T(z)))^{r}\, T(z)}{\bigl(1 - z\varphi'(T(z))\bigr)^{r+1}}.
\end{aligned} \tag{31}$$



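In order to evaluate $\mathbb{E}(I_{n,j}^{\,\underline{r}})$ asymptotically we use the local expansions (5) and (8) and apply
singularity analysis. Again for simplicity in presentation we will only carry out the computations
for the case that the functions involved have $d = 1$ dominant singularities (see Subsection 2.2);
for $d > 1$ one just has to add the contributions of all these singularities.

We obtain then (for $r$ arbitrary, but fixed):

$$\begin{aligned}
[z^n]\, \frac{(\varphi'(T(z)))^{r}\, T(z)}{\bigl(1 - z\varphi'(T(z))\bigr)^{r+1}} &= [z^n]\, \frac{\Bigl(\frac{1}{\rho}\bigl(1 + O(\sqrt{1 - z/\rho})\bigr)\Bigr)^{r}\bigl(\tau + O(\sqrt{1 - z/\rho})\bigr)}{\Bigl(\sqrt{2\rho\tau\varphi''(\tau)}\,\sqrt{1 - z/\rho} + O(1 - z/\rho)\Bigr)^{r+1}} \\
&= [z^n]\, \frac{\tau}{\rho^{r}\,(2\rho\tau\varphi''(\tau))^{(r+1)/2}}\cdot\frac{1}{(1 - z/\rho)^{(r+1)/2}}\Bigl(1 + O\bigl(\sqrt{1 - z/\rho}\bigr)\Bigr) \\
&= \frac{\tau\, n^{(r-1)/2}}{\rho^{n+r}\,(2\rho\tau\varphi''(\tau))^{(r+1)/2}\,\Gamma\bigl(\frac{r+1}{2}\bigr)}\bigl(1 + O(n^{-1/2})\bigr).
\end{aligned}$$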

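Together with the asymptotic formula for $T_n$ given in (6), we obtain from (31) after simple
computations:

$$\begin{aligned}
\mathbb{E}(I_{n,j}^{\,\underline{r}}) &= \frac{(n-j)!\, r!\,(n-r-1)!\,\sqrt{2\pi\varphi''(\tau)}\; n^{3/2}\,\tau\, n^{(r-1)/2}}{(n-r-j)!\, n!\,\sqrt{\varphi(\tau)}\,(2\rho\tau\varphi''(\tau))^{(r+1)/2}\,\Gamma\bigl(\frac{r+1}{2}\bigr)}\bigl(1 + O(n^{-1/2})\bigr) \\
&= \frac{r!\,\sqrt{\pi}}{(2\rho\tau\varphi''(\tau))^{r/2}\,\Gamma\bigl(\frac{r+1}{2}\bigr)}\cdot\frac{(n-j)^{\underline{r}}}{n^{r/2}}\bigl(1 + O(n^{-1/2})\bigr).
\end{aligned}$$

Using the duplication formula for the Gamma-function:

$$\Gamma\Bigl(\frac{r+1}{2}\Bigr)\,\Gamma\Bigl(\frac{r}{2} + 1\Bigr) = \frac{r!\,\sqrt{\pi}}{2^{r}},$$

we obtain the following expansion of the $r$-th factorial moment of $I_{n,j}$, which holds uniformly
for all $1 \le j \le n$:

$$\mathbb{E}(I_{n,j}^{\,\underline{r}}) = \frac{2^{r/2}\,\Gamma\bigl(\frac{r}{2} + 1\bigr)}{(\rho\tau\varphi''(\tau))^{r/2}}\cdot\frac{(n-j)^{\underline{r}}}{n^{r/2}}\bigl(1 + O(n^{-1/2})\bigr). \tag{32}$$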
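
As a plausibility check of the uniform expansion, the first factorial moment can be compared with an exact value for unordered trees. The sketch below assumes $\varphi(t) = e^{t}$, so that $T_n = n^{n-1}$ and $\rho\tau\varphi''(\tau) = 1$, and it uses the standard Lagrange inversion identity $[z^n]\,T(z)^k = k\,n^{n-k-1}/(n-k)!$, which is not stated in the text above. With $r = 1$ the exact expression (31) then reduces to $\mathbb{E}(I_{n,j}) = (n-j)\,\frac{(n-2)!}{n^{n-1}}\sum_{k=2}^{n}(k-1)\,k\,\frac{n^{n-k-1}}{(n-k)!}$. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def exact_mean_unordered(n, j):
    """Exact E(I_{n,j}) for unordered labelled trees (phi(t) = e^t), n >= 2.

    Uses [z^n] T^2/(1-T)^2 = sum_{k=2}^{n} (k-1) * k * n^(n-k-1) / (n-k)!
    and evaluates every term in log-space to avoid overflow.
    """
    log_pref = lgamma(n - 1) - (n - 1) * log(n)  # log((n-2)! / n^(n-1))
    total = 0.0
    for k in range(2, n + 1):
        log_term = log(k - 1) + log(k) + (n - k - 1) * log(n) - lgamma(n - k + 1)
        total += exp(log_term + log_pref)
    return (n - j) * total

def asymptotic_mean_unordered(n, j):
    """Leading term of the uniform expansion with r = 1 and rho*tau*phi''(tau) = 1."""
    return sqrt(pi / 2) * (n - j) / sqrt(n)

ratio = exact_mean_unordered(2000, 1) / asymptotic_mean_unordered(2000, 1)
```

For $n = 2000$ and $j = 1$ the ratio differs from $1$ by a few percent, in line with a relative error of order $n^{-1/2}$.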



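5.3. Limiting distributions by applying the method of moments. The asymptotic behaviour
of the moments of $I_{n,j}$ depending on the growth of $j = j(n)$ can be obtained easily from
the uniform expansion (32). An application of the method of moments shows then the limiting
distribution results stated in Theorem 3.6.

5.3.1. Region $n - j \gg \sqrt{n}$. For this region it holds

$$(n-j)^{\underline{r}} = (n-j)^{r}\bigl(1 + O(n^{-1/2})\bigr),$$

which implies the following expansion for the factorial moments:

$$\mathbb{E}(I_{n,j}^{\,\underline{r}}) = \frac{2^{r/2}\,\Gamma\bigl(\frac{r}{2} + 1\bigr)}{(\rho\tau\varphi''(\tau))^{r/2}}\cdot\frac{(n-j)^{r}}{n^{r/2}}\bigl(1 + O(n^{-1/2})\bigr). \tag{33}$$

Together with equation (26) connecting the factorial and the ordinary moments we obtain the
following asymptotic expansion for the $r$-th moments of $I_{n,j}$:

$$\mathbb{E}(I_{n,j}^{\,r}) = \frac{2^{r/2}\,\Gamma\bigl(\frac{r}{2} + 1\bigr)}{(\rho\tau\varphi''(\tau))^{r/2}}\cdot\frac{(n-j)^{r}}{n^{r/2}}\Bigl(1 + O\Bigl(\frac{\sqrt{n}}{n-j}\Bigr)\Bigr).$$

Thus we obtain, for each $r$ fixed and $n \to \infty$:

$$\mathbb{E}\Bigl(\Bigl(\frac{\sqrt{n}}{n-j}\, I_{n,j}\Bigr)^{r}\Bigr) \to \frac{2^{r/2}\,\Gamma\bigl(\frac{r}{2} + 1\bigr)}{(\rho\tau\varphi''(\tau))^{r/2}} = \sigma^{r}\, 2^{r/2}\,\Gamma\Bigl(\frac{r}{2} + 1\Bigr), \tag{34}$$

i.e., the moments of $\frac{\sqrt{n}}{n-j}\, I_{n,j}$ converge to the moments of a Rayleigh distributed random variable
with parameter $\sigma = \frac{1}{\sqrt{\rho\tau\varphi''(\tau)}}$. An application of the theorem of Fréchet and Shohat shows then
the corresponding limiting distribution result of Theorem 3.6.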



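5.3.2. Region $n - j \sim \alpha\sqrt{n}$, $\alpha \in \mathbb{R}^{+}$. Also for this region the asymptotic expansion (33) of the
$r$-th factorial moments computed above holds and one gets further:

$$\mathbb{E}(I_{n,j}^{\,\underline{r}}) \to \frac{2^{r/2}\,\Gamma\bigl(\frac{r}{2} + 1\bigr)}{(\rho\tau\varphi''(\tau))^{r/2}}\,\alpha^{r} = \Bigl(\frac{\alpha}{\sqrt{\rho\tau\varphi''(\tau)}}\Bigr)^{r}\, 2^{r/2}\,\Gamma\Bigl(\frac{r}{2} + 1\Bigr).$$

To continue we require the following lemma.

Lemma 5.2. Let $Y_\gamma$, with $\gamma > 0$, be a discrete random variable with distribution

$$\mathbb{P}\{Y_\gamma = k\} = \frac{\gamma^{k}}{k!}\int_0^\infty x^{k+1}\, e^{-\frac{x^2}{2} - \gamma x}\, dx, \quad \text{for } k \ge 0.$$

Then it holds that the $r$-th factorial moments of $Y_\gamma$ are given as follows:

$$\mathbb{E}(Y_\gamma^{\,\underline{r}}) = \gamma^{r}\, 2^{r/2}\,\Gamma\Bigl(\frac{r}{2} + 1\Bigr).$$

Moreover, the distribution of $Y_\gamma$ is uniquely defined by its sequence of moments.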

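Proof. For $r \ge 0$ we get (the case $r = 0$ shows that the probabilities sum up to 1, i.e., they
define indeed a distribution):

$$\begin{aligned}
\mathbb{E}(Y_\gamma^{\,\underline{r}}) &= \sum_{k \ge 0} k^{\underline{r}}\, \frac{\gamma^{k}}{k!}\int_0^\infty x^{k+1}\, e^{-\frac{x^2}{2} - \gamma x}\, dx = \int_0^\infty e^{-\frac{x^2}{2} - \gamma x}\, \gamma^{r} x^{r+1} \sum_{k \ge r} \frac{(\gamma x)^{k-r}}{(k-r)!}\, dx \\
&= \int_0^\infty e^{-\frac{x^2}{2} - \gamma x}\, \gamma^{r} x^{r+1}\, e^{\gamma x}\, dx = \gamma^{r}\int_0^\infty x^{r+1}\, e^{-\frac{x^2}{2}}\, dx = \gamma^{r}\, 2^{r/2}\int_0^\infty u^{r/2}\, e^{-u}\, du \\
&= \gamma^{r}\, 2^{r/2}\,\Gamma\Bigl(\frac{r}{2} + 1\Bigr).
\end{aligned}$$

To show that the sequence of moments uniquely characterizes the distribution we consider
the moment generating function $F(s) := \mathbb{E}(e^{sY_\gamma})$ of $Y_\gamma$. It can be shown easily that $F(s)$ is
given by the following expression:

$$F(s) = \sum_{k \ge 0} \mathbb{P}\{Y_\gamma = k\}\, e^{ks} = \int_0^\infty x\, e^{-\frac{x^2}{2} - \rho x}\, dx, \quad \text{with } \rho = \gamma(1 - e^{s}).$$

Thus the moment generating function $F(s)$ exists in a real neighbourhood of $s = 0$ (actually it
exists for all real $s$), which implies that the corresponding distribution is uniquely defined by
its moments. □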
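
The statement of Lemma 5.2 can also be checked numerically. The following is a minimal sketch that evaluates the probabilities $\mathbb{P}\{Y_\gamma = k\}$ with a composite Simpson rule and compares truncated factorial moments with the closed form $\gamma^{r} 2^{r/2}\Gamma(\frac{r}{2}+1)$. The integration bound, the step count, the truncation index and the value $\gamma = 1.5$ are arbitrary choices made for this illustration, and all function names are hypothetical.

```python
import math

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def prob(k, g, upper=12.0):
    """P{Y_g = k} = g^k / k! * int_0^inf x^(k+1) exp(-x^2/2 - g*x) dx, truncated at `upper`."""
    integrand = lambda x: x ** (k + 1) * math.exp(-x * x / 2 - g * x)
    return g ** k / math.factorial(k) * simpson(integrand, 0.0, upper)

def falling(k, r):
    """Falling factorial k (k-1) ... (k-r+1)."""
    out = 1
    for i in range(r):
        out *= k - i
    return out

def factorial_moment(r, probs):
    """Truncated r-th factorial moment of the law given by the list `probs`."""
    return sum(falling(k, r) * p for k, p in enumerate(probs))

def closed_form(r, g):
    """Right-hand side of Lemma 5.2."""
    return g ** r * 2 ** (r / 2) * math.gamma(r / 2 + 1)

g = 1.5
probs = [prob(k, g) for k in range(81)]
```

The case $r = 0$ of `factorial_moment` returns the total mass, which should be numerically equal to $1$.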



Since the $r$-th factorial (and thus also ordinary) moments of $I_{n,j}$ converge to the corresponding
moments of $Y_\gamma$, with $\gamma = \frac{\alpha}{\sqrt{\rho\tau\varphi''(\tau)}}$, an application of the theorem of Fréchet and Shohat shows
also for this case the limiting distribution results stated in Theorem 3.6.



5.3.3. Region $n - j \ll \sqrt{n}$. From (32) one easily gets that $\mathbb{E}(I_{n,j}^{\,\underline{r}}) \to 0$, for $r \ge 1$, which, by an
application of the theorem of Fréchet and Shohat, shows $I_{n,j} \xrightarrow{(d)} 0$ as stated in the corresponding
part of Theorem 3.6.



5.4. Explicit formulas for probabilities. For some particular tree families it is possible to
obtain explicit formulas for the probabilities $\mathbb{P}\{I_{n,j} = k\}$ by extracting coefficients from the
trivariate generating function $N(z,u,q)$ as given in (29). E.g., for ordered and unordered trees
($\varphi(t) = \frac{1}{1-t}$ and $\varphi(t) = e^{t}$, respectively) the generating function $N(z,u,q)$ is given by the
following expressions:

$$N(z,u,q) = \frac{1 - T(z+u)}{(1 - T(z+u))^{2} - z - uq}, \quad \text{with } T(z) = \frac{z}{1 - T(z)} \quad \text{(ordered trees)},$$

$$N(z,u,q) = \frac{e^{T(z+u)}}{1 - (z + uq)\, e^{T(z+u)}}, \quad \text{with } T(z) = z\, e^{T(z)} \quad \text{(unordered trees)}.$$

We omit here the necessary computations for extracting coefficients of $N(z,u,q)$ in order to
obtain the required probabilities via

$$\mathbb{P}\{I_{n,j} = k\} = \frac{(j-1)!\,(n-j)!}{n!}\cdot\frac{[z^{j-1}u^{n-j}q^{k}]\, N(z,u,q)}{[z^n]\, T(z)},$$

but we stated the corresponding results in Subsection 3.4 as formulas (12) and (13).
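
For small sizes the coefficient extraction can be cross-checked against a complete enumeration. The sketch below treats unordered trees only, and it is not the computation behind formulas (12) and (13). It assumes the expansion $N(z,u,q) = \sum_{k \ge 0}(z+uq)^{k}\, e^{(k+1)T(z+u)}$ together with the standard Lagrange inversion identity $[w^{m}]\, e^{aT(w)} = a\,(a+m)^{m-1}/m!$, which is not taken from the text above. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def ancestors(v, parent, root, n):
    """Proper ancestors of v, or None if v never reaches the root (cycle)."""
    path = []
    for _ in range(n):
        if v == root:
            return path
        v = parent[v]
        path.append(v)
    return None

def brute_force_law(n, j):
    """Law of I_{n,j} by enumerating all rooted unordered labelled trees of size n."""
    counts, total = {}, 0
    for root in range(1, n + 1):
        others = [v for v in range(1, n + 1) if v != root]
        for choice in product(range(1, n + 1), repeat=n - 1):
            parent = dict(zip(others, choice))
            paths = [ancestors(v, parent, root, n) for v in range(1, n + 1)]
            if any(p is None for p in paths):
                continue  # the parent map contains a cycle, so it is not a tree
            k = sum(1 for a in paths[j - 1] if a > j)  # larger-labelled ancestors of j
            counts[k] = counts.get(k, 0) + 1
            total += 1
    return {k: Fraction(c, total) for k, c in counts.items()}

def binom(a, b):
    """Binomial coefficient that vanishes outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def gf_law(n, j):
    """Law of I_{n,j} from [z^(j-1) u^(n-j) q^i] N(z,u,q) with phi(t) = e^t."""
    law = {}
    for i in range(n - j + 1):
        coeff = Fraction(0)
        for k in range(i, n):   # k = exponent of (z + u*q)
            m = n - 1 - k       # m = exponent of (z + u) in e^{(k+1) T(z+u)}
            lagrange = (k + 1) * Fraction(k + 1 + m) ** (m - 1) / factorial(m)
            coeff += lagrange * binom(k, i) * binom(m, j - 1 - k + i)
        # (j-1)! (n-j)! / n!  divided by  [z^n] T(z) = n^(n-1) / n!
        p = Fraction(factorial(j - 1) * factorial(n - j), n ** (n - 1)) * coeff
        if p:
            law[i] = p
    return law
```

For instance, both `brute_force_law(3, 1)` and `gf_law(3, 1)` assign the probabilities $3/9$, $4/9$ and $2/9$ to the values $0$, $1$ and $2$.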